Background intensities from arrays 3 and 6 (labeled PM101_3 and Ctr_3), which come from the same cell culture, contain some probes with very high values compared to the other arrays. This can be considered an artifact caused by some random technical problem.
To check the impact of this possible artifact, two analyses will be run and compared: one with all probes and one with the high-background probes filtered out.
kable(summary(raw$Eb), align='l', caption='Summary of the raw background signal')
| PM101_1 | PM101_2 | PM101_3 | Ctr_1 | Ctr_2 | Ctr_3 |
|---|---|---|---|---|---|
| Min. :20.0 | Min. :28.00 | Min. : 25.00 | Min. :20.00 | Min. :19.00 | Min. : 18.00 |
| 1st Qu.:36.5 | 1st Qu.:37.00 | 1st Qu.: 36.00 | 1st Qu.:29.00 | 1st Qu.:27.00 | 1st Qu.: 24.50 |
| Median :38.0 | Median :39.00 | Median : 38.00 | Median :31.00 | Median :29.00 | Median : 26.00 |
| Mean :38.0 | Mean :38.54 | Mean : 39.86 | Mean :31.43 | Mean :28.73 | Mean : 26.28 |
| 3rd Qu.:40.0 | 3rd Qu.:40.00 | 3rd Qu.: 40.00 | 3rd Qu.:34.00 | 3rd Qu.:31.00 | 3rd Qu.: 28.00 |
| Max. :50.0 | Max. :60.00 | Max. :22275.50 | Max. :46.00 | Max. :40.00 | Max. :3910.00 |
To view the background intensity images of the arrays, the values are log2-transformed for better visualization.
par(mfrow=c(2,3))   # 2 x 3 grid, one panel per array
for(i in 1:6){
  y <- log2(raw$Eb[,i])   # log2 background of array i
  imageplot(y, raw$printer)
}
The very bright spots on arrays 3 and 6 hide the true signal; the difference becomes visible once these probes are filtered out.
par(mfrow=c(2,3))   # 2 x 3 grid, one panel per array
for(i in 1:6){
  y <- log2(rawf$Eb[,i])   # log2 background of array i after filtering
  imageplot(y, raw$printer)
}
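The object rawf is not defined in this section. As a minimal sketch of one way such a filtered copy could be built, the example below sets background values above a cut-off to NA so that the array layout stays intact. The matrix Eb holds made-up values, and the cut-off of 60 is an assumption taken from the largest background value on the four unaffected arrays in the summary table.

```r
# Toy background matrix: 5 probes x 3 arrays (made-up values, not the real data)
Eb <- matrix(c(38, 39, 13199,
               36, 40, 38,
               40, 37, 289,
               37, 38, 36,
               39, 60, 40),
             ncol = 3, byrow = TRUE,
             dimnames = list(NULL, c("A1", "A2", "A3")))

threshold <- 60   # assumed cut-off: maximum background on the clean arrays

# Copy the matrix and mask values strictly above the threshold
Eb_filtered <- Eb
Eb_filtered[Eb_filtered > threshold] <- NA

# Number of masked probes per array
n_masked <- colSums(is.na(Eb_filtered))
print(n_masked)
```

Masking with NA rather than dropping rows keeps every probe at its printed position, which matters if the filtered values are passed to imageplot with the original layout.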
Excluding these two arrays, the maximum background intensity is 60, so this value is used as a threshold to count the probes with unusually high background.
array3 <- raw[which(raw$Eb[,3]>60),]
table <- as.data.frame(cbind(array3$Eb[,3], array3$genes$GeneName,array3$genes$SystematicName))
colnames(table) <- c('Intensity', 'GeneName','SystematicName')
rownames(table) <- NULL
kable(table, caption='Array 3')
| Intensity | GeneName | SystematicName |
|---|---|---|
| 13199 | C22orf39 | NM_173793 |
| 22275.5 | SLC12A7 | NM_006598 |
| 289 | RACGAP1P | NR_026583 |
| 69 | SRGN | NM_002727 |
| 11104 | PDE11A | NM_001077358 |
| 11104 | A_24_P212949 | A_24_P212949 |
| 62.5 | NUMBL | NM_004756 |
| 86 | GPR110 | NM_153840 |
| 5789 | ZNF567 | NM_152603 |
| 14039 | BMPR1A | NM_004329 |
| 3240 | CST1 | NM_001898 |
| 16327 | SCAND1 | NM_016558 |
| 311.5 | TMC3 | ENST00000359440 |
Array PM101_3 has 13 possible outlier probes.
array6 <- raw[which(raw$Eb[,6]>60),]
table <- as.data.frame(cbind(array6$Eb[,6], array6$genes$GeneName,array6$genes$SystematicName))
colnames(table) <- c('Intensity', 'GeneName','SystematicName')
rownames(table) <- NULL
kable(table, caption='Array 6')
| Intensity | GeneName | SystematicName |
|---|---|---|
| 2599 | THC2654448 | THC2654448 |
| 3910 | AK096129 | AK096129 |
| 3170 | CELA3B | NM_007352 |
| 147 | DA274457 | DA274457 |
Array Ctr_3 has 4 possible outlier probes.
Next, we look at the adjusted p-values of these genes in the unfiltered analysis.
out <- raw[ which(raw$Eb[,3]>60 | raw$Eb[,6]>60) ,]
artifact <- exprs[which(exprs$SystematicName %in% out$genes$SystematicName),]
rownames(artifact) <- NULL
kable(artifact[,c(9, 8, 16)]) #columns: SystematicName, GeneName and adj.P.Value
| SystematicName | GeneName | adj.P.Val |
|---|---|---|
| NM_173793 | C22orf39 | 0.0826749 |
| NM_016558 | SCAND1 | 0.1235352 |
| NM_002727 | SRGN | 0.2305743 |
| NM_153840 | GPR110 | 0.2821158 |
| NM_001898 | CST1 | 0.4246042 |
| A_24_P212949 | A_24_P212949 | 0.4302023 |
| AK096129 | AK096129 | 0.4355608 |
| THC2654448 | THC2654448 | 0.4412561 |
| NM_001077358 | PDE11A | 0.4426612 |
| NM_152603 | ZNF567 | 0.4426612 |
| DA274457 | DA274457 | 0.4447238 |
| ENST00000359440 | TMC3 | 0.4577694 |
| NM_007352 | CELA3B | 0.4762738 |
| NM_006598 | SLC12A7 | 0.6013146 |
| NM_004329 | BMPR1A | 0.7836932 |
| NM_004756 | NUMBL | 0.9390481 |
| NR_026583 | RACGAP1P | 0.9459319 |
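As the table shows, none of these genes is ranked as differentially expressed (adj.P.Val < 0.05) in the unfiltered analysis; the smallest adjusted p-value is 0.083 (C22orf39), while NUMBL and BMPR1A have values of 0.939 and 0.784.
The following methods were used to perform both analyses: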
Background correction using the method normexp:
bgc <- backgroundCorrect(raw,method='normexp')
Normalization between the arrays using the quantile method:
norm <- normalizeBetweenArrays(bgc,method='quantile')
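To illustrate what quantile normalization does, here is a minimal base R sketch on a made-up matrix. It is a simplified version of the idea, not limma's implementation: the helper name quantile_normalize is hypothetical, ties are broken by order of appearance, and missing values are not handled.

```r
# Toy matrix: 4 probes x 3 arrays (made-up values)
x <- matrix(c(5, 4, 3,
              2, 1, 4,
              3, 4, 6,
              4, 2, 8),
            ncol = 3, byrow = TRUE)

quantile_normalize <- function(m) {
  # Sort each column, then average across arrays at each rank
  sorted <- apply(m, 2, sort)
  rank_means <- rowMeans(sorted)
  # Put the rank means back in each column's original order
  out <- apply(m, 2, function(col) rank_means[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(m)
  out
}

xn <- quantile_normalize(x)
print(xn)
```

After this step every array has the same set of values, only arranged in a different order, which is why the normalized boxplots of the arrays line up.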
Filtering control probes:
eset <- norm[norm$genes$ControlType==0,]
Averaging replicated probes:
eset <- avereps(eset,ID=eset$genes[,"SystematicName"])
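The averaging step can be pictured with a small base R sketch. The matrix and IDs below are made up, and the code only mimics the basic idea of averaging rows that share an ID; it does not reproduce avereps in full (for example, it ignores missing values).

```r
# Toy expression matrix: 5 probes x 2 arrays, with repeated IDs (made-up values)
E <- matrix(c(1, 2,
              3, 4,
              5, 6,
              7, 8,
              9, 10),
            ncol = 2, byrow = TRUE)
ids <- c("g1", "g2", "g1", "g3", "g2")

# Sum rows per ID, keeping IDs in order of first appearance
sums <- rowsum(E, group = ids, reorder = FALSE)

# Number of rows that share each ID, in the same order
counts <- as.vector(table(factor(ids, levels = unique(ids))))

# Row-wise division gives the per-ID average on each array
E_avg <- sums / counts
print(E_avg)
```

The result has one row per unique ID, so downstream statistics are computed once per SystematicName rather than once per spot.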
Create the linear model:
f <- factor(targets$Condition, levels = unique(targets$Condition))
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)
contrast.matrix <- makeContrasts(contrasts='PM101-Ctr', levels=design)
fit <- lmFit(eset$E, design)
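As a rough illustration of what the design matrix and the PM101-Ctr contrast encode, the sketch below fits a single made-up probe with ordinary least squares in base R. The sample labels are an assumption based on the array names in this report, and the moderation applied later by eBayes is not reproduced here.

```r
# Assumed sample labels, matching the array names in this report
condition <- c("PM101", "PM101", "PM101", "Ctr", "Ctr", "Ctr")
f <- factor(condition, levels = unique(condition))

# One column per group, no intercept
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)

# The contrast PM101-Ctr as a coefficient vector, in design column order
contrast <- c(PM101 = 1, Ctr = -1)

# Toy log2 intensities for one probe across the six arrays (made-up values)
y <- c(8.1, 7.9, 8.3, 6.0, 6.2, 5.8)

# Least-squares estimates are the group means
coefs <- as.vector(solve(t(design) %*% design, t(design) %*% y))

# Applying the contrast gives the log fold change
logFC <- sum(contrast * coefs)
print(logFC)
```

With a no-intercept design each coefficient is simply a group mean, so the contrast reduces to mean(PM101) minus mean(Ctr) for every probe.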
Compute empirical Bayes statistics:
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
[Figure: boxplots for the two analyses; panels: All Probes, Filtered Probes]
The boxplots of the foreground intensities show no change between the two analyses.
The normalized boxplots differ noticeably between the two analyses: the filtered analysis has fewer outlier values.
The two analyses also differ in the number of regulated probes:

| Analysis | Up regulated | Down regulated |
|---|---|---|
| All Probes | 4685 | 4633 |
| Filtered Probes | 4769 | 4826 |
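The report does not show how the up and down counts were obtained. One plausible way, sketched here on a made-up results table, is to combine the adj.P.Val < 0.05 cut-off used elsewhere in this report with the sign of logFC; this is an assumption, not the confirmed procedure.

```r
# Toy results table (made-up values); column names follow the exprs table above
res <- data.frame(
  logFC     = c(1.2, -0.8, 0.3, -2.1, 0.9),
  adj.P.Val = c(0.01, 0.03, 0.20, 0.04, 0.60)
)

# Probes passing the significance cut-off
sig <- res$adj.P.Val < 0.05

# Split the significant probes by direction of change
up   <- sum(sig & res$logFC > 0)
down <- sum(sig & res$logFC < 0)
print(c(up = up, down = down))
```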
par(mfrow=c(1,2))
volcanoplot(exprs$logFC, exprs$adj.P.Val, rank, title='PM101-Ctr All Probes')
volcanoplot(exprs2$logFC, exprs2$adj.P.Val, rank2, title='PM101-Ctr Filtered Probes')
diff <- exprs[exprs$adj.P.Val<0.05,]
diff2 <- exprs2[exprs2$adj.P.Val<0.05,]
From the design experiment with all probes, table(diff2$ProbeName %in% diff$ProbeName)
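The comparison of the two sets of significant probes can be sketched on toy data as follows. The probe IDs are hypothetical; the point is only to show how the table of %in% results reads, and how setdiff can list the probes that differ.

```r
# Toy probe IDs (hypothetical names, not taken from the real arrays)
sig_all      <- c("p1", "p2", "p3", "p4")
sig_filtered <- c("p2", "p3", "p4", "p5", "p6")

# FALSE = significant only after filtering, TRUE = significant in both analyses
overlap <- table(sig_filtered %in% sig_all)
print(overlap)

# Probes gained and lost by filtering
gained <- setdiff(sig_filtered, sig_all)
lost   <- setdiff(sig_all, sig_filtered)
print(gained)
print(lost)
```

Note that the table only counts probes from the filtered set; probes that were significant with all probes but not after filtering show up in lost instead.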